Automatic speaker age and gender recognition using acoustic and prosodic level information fusion
Abstract
The paper presents a novel automatic speaker age and gender identification approach which combines seven different methods at both the acoustic and prosodic levels to improve the baseline performance. The three baseline subsystems are (1) a Gaussian mixture model (GMM) based on mel-frequency cepstral coefficient (MFCC) features, (2) a support vector machine (SVM) based on GMM mean supervectors and (3) an SVM based on 450-dimensional utterance-level features including acoustic, prosodic and voice quality information. In addition, we propose four subsystems: (1) an SVM based on UBM weight posterior probability supervectors using the Bhattacharyya probability product kernel, (2) sparse representation based on UBM weight posterior probability supervectors, (3) an SVM based on GMM maximum likelihood linear regression (MLLR) matrix supervectors and (4) an SVM based on the polynomial expansion coefficients of the syllable-level prosodic feature contours in voiced speech segments. Contours of pitch, time-domain energy, frequency-domain harmonic structure energy and formant for each syllable (segmented using energy information in the voiced speech segment) are considered for analysis in subsystem (4). The four proposed subsystems have been demonstrated to be effective and able to achieve competitive results in classifying different age and gender groups. To further improve the overall classification performance, weighted-summation based fusion of these seven subsystems at the score level is demonstrated. Experimental results are reported on the development and test sets of the 2010 Interspeech Paralinguistic Challenge aGender database. Compared to the SVM baseline system (3), which is the baseline system suggested by the challenge committee, the proposed fusion system achieves 5.6% absolute improvement in unweighted accuracy for the age task and 4.2% for the gender task on the development set. On the final test set, we obtain 3.1% and 3.8% absolute improvement, respectively. © 2012 Elsevier Ltd. All rights reserved.
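The weighted-summation score-level fusion described in the abstract can be sketched as follows. The subsystem scores, class labels, and weights below are illustrative placeholders, not values from the paper; in practice each subsystem's weight would be tuned on the development set and the scores normalized to a comparable range first.

```python
# Sketch of weighted-summation fusion of per-class scores from several
# subsystems, followed by a max-score decision. All names and numbers
# here are hypothetical examples, not the paper's actual configuration.

def fuse_scores(subsystem_scores, weights):
    """Combine per-class score dicts from several subsystems.

    subsystem_scores: list of dicts mapping class label -> score
    weights: list of floats, one weight per subsystem
    Returns a dict mapping class label -> fused score.
    """
    fused = {}
    for scores, w in zip(subsystem_scores, weights):
        for label, s in scores.items():
            fused[label] = fused.get(label, 0.0) + w * s
    return fused


def predict(subsystem_scores, weights):
    """Pick the class with the highest fused score."""
    fused = fuse_scores(subsystem_scores, weights)
    return max(fused, key=fused.get)


# Toy example: three subsystems scoring four hypothetical age/gender classes
scores = [
    {"child": 0.1, "young_f": 0.5, "young_m": 0.3, "adult_f": 0.1},
    {"child": 0.2, "young_f": 0.4, "young_m": 0.2, "adult_f": 0.2},
    {"child": 0.1, "young_f": 0.3, "young_m": 0.5, "adult_f": 0.1},
]
weights = [0.5, 0.3, 0.2]
```

With these toy values the fused score for "young_f" (0.5·0.5 + 0.3·0.4 + 0.2·0.3 = 0.43) exceeds every other class, so `predict` selects it.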
Similar Articles
Combining five acoustic level modeling methods for automatic speaker age and gender recognition
This paper presents a novel automatic speaker age and gender identification approach which combines five different methods at the acoustic level to improve the baseline performance. The five subsystems are (1) Gaussian mixture model (GMM) system based on mel-frequency cepstral coefficient (MFCC) features, (2) Support vector machine (SVM) based on GMM mean supervectors, (3) SVM based on GMM maxi...
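Like this related work, the main paper builds SVMs on GMM-derived supervectors; its first proposed subsystem applies the Bhattacharyya probability product kernel to UBM weight posterior probability supervectors. A minimal sketch of that kernel, assuming the supervectors are discrete probability vectors (the toy values are illustrative, not from either paper):

```python
import math

# Bhattacharyya probability product kernel between two discrete
# probability vectors: k(p, q) = sum_i sqrt(p_i * q_i).
# For identical distributions the kernel attains its maximum value 1.0.

def bhattacharyya_kernel(p, q):
    """Compute sum_i sqrt(p_i * q_i) for two probability vectors."""
    if len(p) != len(q):
        raise ValueError("supervectors must have equal dimension")
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))


# Toy 4-component UBM weight posterior supervectors
uniform = [0.25, 0.25, 0.25, 0.25]
peaked = [0.70, 0.10, 0.10, 0.10]
```

Such a kernel can be supplied to an SVM as a precomputed Gram matrix; it equals 1.0 only when the two distributions coincide and decreases as they diverge.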
Demographic recommendation by means of group profile elicitation using speaker age and gender recognition
In this paper we show a new method of using automatic age and gender recognition to recommend a sequence of multimedia items to a home TV audience comprising multiple viewers. Instead of relying on explicitly provided demographic data for each user, we define an audio-based demographic group profile that captures the age and gender for all members of the audience. A 7-class age and gender class...
A Comparative Study of Gender and Age Classification in Speech Signals
Accurate gender classification is useful in speech and speaker recognition as well as speech emotion classification, because a better performance has been reported when separate acoustic models are employed for males and females. Gender classification is also apparent in face recognition, video summarization, human-robot interaction, etc. Although gender classification is rather mature in a...
On the use of high-level information in speaker and language recognition
Automatic Speaker Recognition systems have been largely dominated by acoustic-spectral based systems, relying in proper modelling of the short-term vocal tract of speakers. However, there is scientific and intuitive evidence that speaker specific information is embedded in the speech signal in multiple shortand long-term characteristics. In this work, a multilevel speaker recognition system com...
Automatic discrimination between laughter and speech
Emotions can be recognized by audible paralinguistic cues in speech. By detecting these paralinguistic cues that can consist of laughter, a trembling voice, coughs, changes in the intonation contour etc., information about the speaker’s state and emotion can be revealed. This paper describes the development of a gender-independent laugh detector with the aim to enable automatic emotion recognit...
Journal: Computer Speech & Language
Volume: 27, Issue: -
Pages: -
Publication date: 2013